Item response data were obtained from large samples
of students in Grades K-community college, taking the following
tests: Stanford Early School Achievement Test, Stanford Achievement
Test, Stanford Test of Academic Skills, Stanford Diagnostic Reading
Test, and Stanford Diagnostic Mathematics Test. Data were classified
as fitting or non-fitting the Rasch Model, according to mean square
fit or adjusted mean square fit statistics. Non-fitting items were
examined for consistencies in item content or format. Results
indicated the following: (1) high percentages of spelling, reading,
and mathematics items in all tests analyzed fit the Rasch Model; (2)
"prior notions of likely fit" do not include specific types of item
content or format; (3) items measuring knowledge of specific content
may not fit if the item content is not always taught or does not
follow a regular pattern of instruction at particular grades and
times of year. (BL)
The Rasch Model has been used in a variety of situations for item analysis/
equating purposes. The model has been shown to be appropriate for such
different types of tests as achievement tests in reading, mathematics, and
other content areas; diagnostic tests of school-related content areas;
criterion-referenced reading tests; intelligence or aptitude tests; and
writing tests (Rentz and Rentz, 1978). Since the appeal of the Rasch Model
is largely due to its characteristics of "item-free" person measurement and
"person-free" item measurement, and to its efficient means of handling such
major test development phases as equating and scaling, it appears that the
Rasch Model will be used more and more as a test development and test
analysis model.

The use of the model for equating purposes requires that the items
used for equating "fit" the model. Although the model appears to be
robust enough to tolerate some degree of departure from its assumptions
(Rentz and Ridenour, 1978), it would be helpful to know beforehand which
types of items best fit the assumptions of the model. We know about the
item characteristics that generally cause a lack of fit to the model.
Items with extreme discrimination values generally appear as non-fitting
items, for example. When item discrimination values are known, this
information can help the test developer choose appropriate items for
equating purposes. Items that for one reason or another lead to guessing
on the part of examinees also tend not to fit the Rasch Model. This will
often be the case for very difficult items on an achievement test that
measure concepts that have not yet been taught to the examinees. Items
that are confusing, ambiguous, or have more than one correct answer also
tend not to fit the model.
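The effect of guessing on fit can be illustrated with a minimal sketch. The Rasch response function used below is the standard one-parameter logistic form; it is not written out in this paper. The guessing floor is a hypothetical device added here purely for illustration, and the function names and the values chosen (a difficulty of 2 logits, a floor of 0.25) are assumptions.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct response under the Rasch model (logit scale)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def probability_with_guessing(ability, difficulty, guess_floor):
    """Hypothetical non-Rasch item: success never falls below guess_floor."""
    return guess_floor + (1.0 - guess_floor) * rasch_probability(ability, difficulty)


# Compare the two curves for a hard item (difficulty = 2 logits) on which
# low-ability examinees can guess among four options (floor = 0.25).
for ability in (-3.0, -1.0, 1.0, 3.0):
    expected = rasch_probability(ability, 2.0)
    observed = probability_with_guessing(ability, 2.0, 0.25)
    print(f"ability {ability:+.1f}: Rasch {expected:.3f}, with guessing {observed:.3f}")
```

The gap between the two curves is largest at low ability levels, which is where a very difficult, not-yet-taught item would be expected to depart from the model.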



Achievement test items may tend not to fit the model if instruction
in certain areas has not been continuous and/or if the sample of examinees
analyzed has not been exposed to particular instruction. Achievement test
content results from an analysis of textbooks and assumes a
certain continuity and progression in instruction. Since this continuity
may not always reflect actual classroom instruction, items measuring
content that is not consistently taught may show up as non-fitting.

Another major reason that some items show up as non-fitting is that
they measure a different skill or content area than the rest of the items
in the test. A major assumption of the Rasch Model is that the items being
analyzed measure a unidimensional trait. According to Rentz and Rentz (1978),
"There are no separate adequate tests of the unidimensionality assumption
which are really adequate. . . There is no clear definition of unidimensionality
when you go beyond the mathematical definition." This does not mean, however,
that test developers have no criteria to review in order to evaluate the
unidimensionality of a set of items. Rather, Rentz and Rentz (1978) recommend
that the tighter the definition of content, the easier the items, and the more
care taken in writing the items, the better the chances are of meeting the
unidimensionality assumption. Rentz and Rentz go on to say that "prior notions
of likely fit would contribute to efficiency in using Rasch Model methodology."

Although test developers using Rasch Model methods may have to include
"non-fitting" items or items in a set that appear to measure more than one
trait due to considerations of curriculum coverage, it would be useful for
test developers to know in advance which types of items may not be ideal for
equating purposes.

In this study, non-fitting items from a variety of sources were
reviewed and analyzed. Non-fitting items within tests and across test
forms and levels were examined to see if consistencies in item content
or format were apparent. Could any generalizations be made about types
of test items that were likely not to fit the Rasch Model?




METHOD 



Items from all levels, forms, and subtests of the following tests were
analyzed by using Wright's MESAMAX program, which generates various Rasch
statistics:



Stanford Early School Achievement Test (SESAT), 1969 edition
Stanford Achievement Test (SAT), 1973 edition
Stanford Test of Academic Skills (TASK), 1973 edition
Stanford Diagnostic Reading Test (SDRT), 1976 edition
Stanford Diagnostic Mathematics Test (SDMT), 1976 edition

SESAT, SAT, and TASK are achievement tests that cover a grade range of
Kindergarten through twelfth grade in a variety of content areas; SDRT
and SDMT are diagnostic tests that cover a range of first grade through
community college.

Item response data from large samples of students taking these tests
were used. In general, these data were taken from standardization or other
large-scale research programs.

The mean square fit (MSF) statistic, a χ² fit statistic, was used to
identify non-fitting items. This statistic is arrived at by determining the
expected proportion of examinees at each ability level who should correctly
answer an item according to the model and comparing that with actual
proportions. The MSFs for sample sizes over 1500 were then adjusted
(Adjusted MSF = MSF × 1500/N, where N is the sample size) because this
statistic tends to inflate as sample size increases (Rentz and Ridenour,
1978). In addition, this adjustment facilitates comparisons over different
samples and analyses. Items with MSFs or adjusted MSFs (AMSFs) greater than
two were classified as "non-fitting" for the purposes of this study.
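The classification rule can be sketched as follows. This is a minimal illustration, not the program used in the study. It assumes that the adjustment rescales the MSF by 1500/N for samples larger than 1500, which is one reading of the adjustment described above. The function and parameter names are hypothetical.

```python
def adjusted_msf(msf, sample_size, reference_n=1500):
    """Rescale the mean square fit for large samples (assumed form: MSF * 1500 / N)."""
    if sample_size <= reference_n:
        return msf  # no adjustment at or below the reference sample size
    return msf * reference_n / sample_size


def is_non_fitting(msf, sample_size, cutoff=2.0):
    """Apply the study's criterion: an (adjusted) MSF greater than two is non-fitting."""
    return adjusted_msf(msf, sample_size) > cutoff


# Hypothetical items: the same raw MSF is judged differently once the
# sample size is taken into account.
print(is_non_fitting(3.0, 1000))   # small sample, raw MSF used
print(is_non_fitting(3.0, 4500))   # large sample, MSF scaled down first
```

Under this reading, a raw MSF of 3.0 would count as non-fitting with 1,000 examinees, but with 4,500 examinees it would be scaled to 1.0 and count as fitting.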

The items in these tests were often analyzed in several ways. For
example, Reading Comprehension items were analyzed as part of an individual
subtest, Reading Comprehension, and as part of a larger total, a Reading
aggregate. To facilitate comparisons, the results reported here are from the
subtest analyses. (An exception is the SAT Mathematics tests; since items
in the three SAT Mathematics subtests were only analyzed as part of a larger
aggregate, Total Mathematics, this aggregate analysis will be reported here.)

Non-fitting items were then reviewed as a group and in relation to
fitting items, for possible consistencies in item format, content, and/or
skill being tested. Although all subtests were analyzed in this way, only
subtests where clear patterns emerged within tests or across test forms
and/or levels are presented here.



RESULTS AND DISCUSSION 
In general, high percentages of items in all tests analyzed fit the
Rasch Model, using a MSF or AMSF less than two as a criterion for fit
(see Tables 1 through 6).

In most cases, individual items appeared to be non-fitting for
individual reasons. For example, a Reading Comprehension item appeared not
to fit because it required some math computation to arrive at the answer;
a Listening Comprehension item appeared not to fit because correctly answering
this item seemed to depend more on looking at the accompanying picture than
on listening to the passage that was dictated. In cases such as these, the
non-fitting items rather clearly stood out as being different in some
distinct way from most of the fitting items.

However, in some cases, specific item types tended not to fit,
regardless of test form, level, or type of test (i.e., achievement or
diagnostic). The consistencies found in several subtests
will be presented.

Spelling

The fit of Spelling items to the Rasch Model was generally very good.
The analysis of the Spelling subtest of SAT showed percentages of fitting items
ranging from 83% to 100% for Primary III through Advanced levels, Forms A and B
(see Tables 2 and 3). The Primary II level of the test, however, has a
different format for Spelling than the Primary III through Advanced levels,
and, at this level, the percentage of fitting items was lower. On Primary II
Form A, 67.4% of the items fit, and on Form B, 74.4% of the items fit.

A closer examination of the formats is useful. Primary II
Spelling items appear in this format:

           R    W    DK
1   lat    o    o    o

Students must identify whether the given word is spelled right or wrong, or
choose DK if they don't know.

Primary III through Advanced Spelling items appear in this format:



frightinNL^ 
generation ■ cojrthent " 



Students must choose the one incorrectly spelled word from four different words.

At the Primary II level, it would appear that two distinct skills may be being
measured, depending on whether the stimulus word is spelled correctly or
incorrectly. Eighty-nine percent of the non-fitting items were words presented
as incorrect spellings; the fitting items tended to be those that were
correctly spelled. One might hypothesize that a skill closer to word
recognition was being measured by the correctly spelled words, while
the incorrectly spelled words at this level require a skill that more
closely approximates that skill required by the upper levels of the test.
Perhaps spelling skills are not taught at this level to the extent that
word recognition reading skills are taught.

A χ² analysis of the fitting and non-fitting items at this level
shows clearly that the relationship between item type and fit to the model
is significant.

Primary II Spelling, Form A
Correctly spelled words 
Incorrecitly spelled words 









14



χ² = 14.8 (p < .001)



Primary II Spelling, Form B

Correctly spelled words
Incorrectly spelled words





2 


L 13 


10



χ² = 6.08 (p < .05)



Reading Comprehension 

The fit of reading items to the model was also generally good. The
analysis of the Reading Comprehension subtest of SAT showed percentages of
fitting items ranging from 81.4% to 97.3% for Primary I through Advanced,
Forms A and B, 97% to 99% for TASK I and II, Forms A and B, and 68.7% to
91.7% for SDRT, Red through Blue level, Forms A and B (see Tables 2 through 5).

Of the relatively few non-fitting items, more appear to measure
inferential comprehension than literal comprehension. (Items measuring
global, implicit, contextual, and inferential meanings according to
published item objectives are classified as inferential items for the
purposes of this study; items measuring explicit or literal meanings are
classified here as literal items.) This is probably due to the fact that
inferential items invite more guessing than literal items and that
inferential items may have more of a tendency to be ambiguous than literal
items.

Although χ² analyses showed no significant relationships, in every
case but one (SDRT Blue, Form A) the percentage of fitting literal
comprehension items is greater than the percentage of fitting inferential items
(see Table 7).
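The comparison of literal and inferential items can be checked directly against the Table 7 percentages. The sketch below assumes the row pairing used here, in which each test form is matched to one literal and one inferential value in the order they appear in Table 7. The variable names are illustrative only.

```python
# (test form, % literal items fitting, % inferential items fitting), as read from Table 7
TABLE_7 = [
    ("SAT Intermediate I, Form A", 92.5, 80.0),
    ("SAT Intermediate I, Form B", 95.5, 75.0),
    ("SAT Intermediate II, Form A", 95.7, 87.5),
    ("SAT Intermediate II, Form B", 91.7, 87.2),
    ("SAT Advanced, Form A", 95.0, 90.7),
    ("SAT Advanced, Form B", 100.0, 96.2),
    ("SDRT Green, Form A", 80.0, 60.0),
    ("SDRT Green, Form B", 86.7, 70.0),
    ("SDRT Brown, Form A", 96.7, 86.7),
    ("SDRT Brown, Form B", 80.0, 76.7),
    ("SDRT Blue, Form A", 90.0, 90.0),
]

# Forms where literal items fit at a strictly higher rate than inferential items.
literal_higher = [name for name, literal, inferential in TABLE_7 if literal > inferential]
# Forms where they do not (ties included).
exceptions = [name for name, literal, inferential in TABLE_7 if literal <= inferential]

print(f"{len(literal_higher)} of {len(TABLE_7)} forms favor literal items")
print("exceptions:", exceptions)
```

With this pairing, the single exception is SDRT Blue, Form A, where the two percentages are equal.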

Mathematics 

The fit of Mathematics items to the Rasch Model was very high for SAT,
all levels, Forms A and B. Percentages of fitting items ranged from
93.8% to 100% for SAT Primary I through Advanced levels, Forms A and B (see
Tables 2 and 3). SESAT I Mathematics had 71.4% fitting items, and SESAT II
Mathematics had 83.6% fitting items (see Table 1). TASK I and II, Forms A and
B, had percentages of items that fit ranging from 7?% to 94% (see Table 4).
SDMT had percentages of fitting items ranging from 85% to 100% for all levels
and forms (see Table 6).

Consistencies among non-fitting items were not easy to find. On the
SESAT I Mathematics test, three of the four non-fitting items were dictated
word problems that required computation. Fitting items tested a variety of math
concepts but only one required computation. This is a fairly clear example
of items that don't fit because they measure a different skill or because
they measure something that hasn't been taught yet.

One consistency noted on several levels and forms of the tests studied
involved items requiring knowledge of the metric system. Each of Forms A and B
of SDMT levels Green, Brown, and Blue contains three items that require knowledge
of the metric system. Although the total number of metric items is small,
61% (11) of the 18 metric items did not fit the model. This can be compared
with the generally high percentages of fitting SDMT items overall (see Table 6).
On TASK, the one item per form and level requiring knowledge of the metric
system did not fit the model, although high percentages of all mathematics
items do fit the model at this level (see Table 4). Metric items were generally
not tested on SAT Mathematics tests.

This finding is again likely due to the fact that at the time these item
response data were collected (early to mid 1970's) the metric system was not
systematically taught and the sample tested had not been uniformly exposed to
instruction in this area. It would be interesting to see if recent item
response data on metric items still show this pattern.

Letters and Sounds

On this reading subtest of SESAT I, 75% of the items fit the model
(see Table 1). On SESAT II, 78% of the items fit (see Table 1).

Several consistencies in item content for non-fitting items were noted.
For example, items testing recognition of the letters "p" and "d" did not fit
when either level of the test was analyzed. In addition, items testing the
sound of the letter "h" did not fit the model for either analysis. On SESAT II,
a χ² analysis showed there to be a significant relationship between fit and
type of initial sound tested (blend/initial letter). Since blends are most
often taught after initial letters, test items measuring blends may lead to
more guessing on the part of examinees.



Blends 
Non-blends 



SESAT II, Sounds and Letters
Fit
15





χ² = 6.54 (p < .05)
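The χ² analyses of item type by fit reported in this study (here and for Primary II Spelling) are tests of association in a 2 × 2 table. A minimal sketch of the uncorrected Pearson statistic follows. The cell counts in the example are hypothetical, since the original counts are not fully legible, and whether the original analyses applied a continuity correction is not stated.

```python
def chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Critical values of chi-square with 1 degree of freedom.
CRITICAL_05 = 3.841
CRITICAL_001 = 10.828

# Hypothetical counts, NOT the study's data.
# Rows: item type (e.g., non-blend, blend); columns: fit, non-fit.
statistic = chi_square_2x2(20, 3, 9, 11)
print(f"chi-square = {statistic:.2f}, significant at .05: {statistic > CRITICAL_05}")
```

A statistic above 3.841 corresponds to p < .05, and one above 10.828 to p < .001, the two significance levels reported in this study.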



This study indicates that, in general, "prior notions of likely fit"
do not include specific types of item content and/or format; varied types of
content and item types do fit the model well.

However, the study does reinforce the idea that items measuring
knowledge of specific content may not fit the Rasch Model if the item content
is not always taught (e.g., metric items) or does not follow a regular pattern
of instruction at particular grades and times of year (e.g., spelling skills
at second grade level, sounds of blends at first grade level).

An analysis such as this can offer some insight into the skills being
tested at various levels (e.g., word reading vs. spelling skills at lower
grade levels) and into the differences between various types of items
(e.g., literal and inferential comprehension items). It seems, however, that
individual test developers would do best to use their own judgment as to what
types of items should be tested together. If the items are similar to the
vast majority of items analyzed here, they will fit the model regardless of
specific item content or format.



Table 1. Percentage of SESAT I and II items that fit the Rasch Model

Subtest

Environment
Mathematics
Aural Comprehension
Letters and Sounds
Word Reading
Sentence Reading



78.6 
71.4 
75.0 
78 .'e 



79.5 

83.6
77.8

:^8.o ^ 

82.8 
64.1 



Table 2. Percentage of SAT '73 Form A items that fit the Rasch Model



Subtest 


Primary I 


Primary II 


Primary III


Int. I 


Int. II 


Advanced 


Vocabulary


78.4 


81.1 


88.9 


94.0 


96.0 


90.0 


Read. Comp. 


93.1 


91.4 


92.9 


84.7 


90.1 


91.9 


Word Study 
Skills 


85.0 


90.8




98.2 


96.0 


• 


Total Math 


93.8 


^7 n 


y / .9 


98.2


99.2 


- 

99.2 


Spelling 




67.4




100.0


96.7 


93.3* 


Language 






89.1 


98.7 


98.8 


93.7 


Soc. Sci. 




92.6*. ^ 


90.9 


96.7 


94.4 


98.3 


Science 




88.9 



95.2 


91.7 


95.0 


98.3 


Listening 


80.8 


88.0 


94.0 


98.0


96.0 





Table 3. Percentage of SAT '73 Form B items that fit the Rasch Model

Subtest Primary I Primary II Primary III Int. I Int. II Advanced



Vocabulary 54.1

Read. Comp. 95.4

Word Study

Skills 80.0

Total Math 96.9 

Spelling 

Language 

Soc. Sci. 

Science — 

Listening 92.3 



81.1. 
92.5 



82.2
81.4 



90.0 
88.9 



94.0 
88.7



86.0 
97.3 



90.8 


. 83.6 


94.4 


94.0- 




97.0


95.8 


100.0




" 99.2 


74.4 . 


83.0 


98.0 


95.0 


96.7 


MM • - A 


85.5 


"94.9 


95.a 


91.1 


81.5 ' 


86.4 


95.0 


92.6 


90.0 


92.6 


83.3 


91.7 


98.3 


98.3


90.0 


90.0 


92.0 


98.0

Table 4. Percentage of TASK '73 items that fit the Rasch Model



Subtest 



TASK I



Form A Form B 



Level 



TASK II 
Form A Form B 



Mathematics 



Table 5. Percentage of SDRT '76 items that fit the Rasch Model

Level

Red   Green   Brown   Blue

Subtest   Form A   Form B   Form A   Form B   Form A   Form B   Form A
Auditory 

Vocab. 69.5 77.8 62.5 82.5 / 77.5 72.5 , -- 



_Auditory • 

Discrim. 65^0 '75.0 63.9 



88.9 



Phonetic .. ^ - ' " 

Analysis 62.5 12.5 72.2 " 77.8 " 11 .Q . 66.7- 90.0 



Structural ~ . ♦ - 

•Analysis—' — 81.'7^^ 90.0 - 87.0 ■ 77.8 

Word ' ' \ . 

-^Meaning — » — ' " 

Word Parts — — ^ ^ 

Read. Comp. 68.7 68.7 70.0 78.3 91.7 78.3

Word . _ 

Reading 81.0 88.1 

Scan./Skim. — —



91.7 

100.0
83.3 
90.0 



97.0 



Table 6. Percentage of SDMT '76 items that fit the Rasch Model

Level

Red   Green   Brown   Blue

Subtest   Form A   Form B   Form A   Form B   Form A   Form B   Form A   Form B

Number System & Numeration: 63.3, 63.3, 75.0, 72.2, 50.0, 69.4
Computation: 84.8, 66.7, 85.4, 72.9, 77.1, 75.0
Applications: 90.0, 83.3, 70.0, 73.3, 57.6, 57.6

Table 7. Percentages of Reading Comprehension Items that fit the Rasch Model

Test                      Literal    Inferential

SAT Intermediate I
    Form A                  92.5        80.0
    Form B                  95.5        75.0

SAT Intermediate II
    Form A                  95.7        87.5
    Form B                  91.7        87.2

SAT Advanced
    Form A                  95.0        90.7
    Form B                 100.0        96.2

SDRT Green
    Form A                  80.0        60.0
    Form B                  86.7        70.0

SDRT Brown
    Form A                  96.7        86.7
    Form B                  80.0        76.7

SDRT Blue
    Form A                  90.0        90.0



Table 8. Sample Sizes of Rasch Analyses



Test 



Level 



Form 



Sample Size (Approximate) 



SAT 



SDRT 



SDMT 



SESAT I
SESAT II
Primary I

Primary II



Primary III



Int. I
Int. II
Adv.

TASK I
TASK II

Red
Green
Brown

Red

Green

Brown

Blue
— 



A 

A 

A " 

B 

A 

B 

A 

B 

A 

B 

A 

B 

A 

B 

A 

B 

A* 

B 

A 
B 
A 
B 
.A 
B 



A 

A 
B ' 
' A 

.B 
A 
B 



500
800
3600
3300
4100
3700
4200
3400
4500
3800
8500
6300
8000
7500
10000
10000
4300
1800

1500
1400
1600
1500
900
1500

1500
1600
1500

leqp * 
1700

2000
1500
1600